How Does Your Website Recommend the Right Product?
A Short Answer
We live in a world of innovations that have made human lives easier, and technological advances have made us increasingly digital. Gone are the days when we had to plan a shopping trip days in advance, queue in front of a theater for tickets, or wait for friends and relatives to suggest a product. We no longer rely on word-of-mouth communication; today almost everything is quantified with numbers and statistics. It is a remarkable story of technological innovation in which almost everything is available at the touch of a finger. Online shopping started as far back as 1979, and electronic commerce has since grown enormously: it is one of the major contributors to the world economy as well as our go-to option for meeting our ever-growing needs.

Technological advances have also brought a huge inflow of data, and numerous products have been added to online catalogs, widening the range of products we can choose from. This is where recommender systems play a major role. Recommender systems apply statistical and mathematical techniques to make product recommendations while a customer is interacting online, and they have achieved great success in e-commerce.

Like many other search systems, recommender systems make two types of errors: false negatives, which are products that are not recommended even though the customer would like them, and false positives, which are products that are recommended even though the customer does not like them. In e-commerce the most important errors to avoid are false positives, since they lead to angry customers. Because an e-commerce site usually offers many products that a customer would like to buy, there is no good reason to risk recommending one the customer will not like.
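To make the two error types concrete, here is a minimal sketch. The item names are made up and `recommendation_errors` is a hypothetical helper. It also reports precision (the share of recommendations the customer actually likes), a standard measure added here for illustration rather than one this article defines: keeping false positives low is the same as keeping precision high.

```python
def recommendation_errors(recommended, liked):
    """Split a recommendation list into hits, false positives and false negatives.

    recommended: items the system recommended to the customer
    liked: items the customer would actually like (known only in hindsight,
           e.g. from held-out purchase data)
    """
    recommended, liked = set(recommended), set(liked)
    hits = recommended & liked             # recommended and liked
    false_positives = recommended - liked  # recommended, but not liked
    false_negatives = liked - recommended  # liked, but never recommended
    precision = len(hits) / len(recommended) if recommended else 0.0
    return hits, false_positives, false_negatives, precision


hits, fp, fn, precision = recommendation_errors(
    recommended=["laptop", "mouse", "blender"],
    liked=["laptop", "mouse", "keyboard", "monitor"],
)
print(sorted(fp), sorted(fn), round(precision, 2))
# ['blender'] ['keyboard', 'monitor'] 0.67
```

Here the blender is the costly mistake in the article's sense (a false positive), while the missed keyboard and monitor are false negatives that the customer never sees.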
Two challenges matter most for collaborative filtering: scalability and the quality of recommendations. They conflict with one another: the less time an algorithm spends searching for the right neighbors, the more scalable it can be, but the worse its recommendations become. For this reason it is important to treat the two challenges together, so that the solutions used are both useful and practical.

Recommender systems typically produce recommendations in one of two ways: collaborative filtering or content-based filtering. Collaborative filtering builds a recommendation model from a user's past purchases, the ratings the user gave to the purchased products, and similar decisions made by other customers. This model is then used to predict which items in the catalog the user may be interested in. Content-based filtering is more item-specific: it uses a set of discrete characteristics of each item in the catalog to recommend additional items with similar characteristics. Combining the two approaches yields a hybrid recommender system.

Collaborative filtering is a widely used recommendation approach. It analyzes a large amount of data on users' interests, activities, or choices and predicts what a user might like based on the interests of similar users. A key advantage of collaborative filtering is that it does not rely on machine-analyzable content, so it can accurately recommend complex items, such as movies, without requiring an understanding of the item itself. The method assumes that users who agreed in the past will also agree in the future. Collaborative filtering is classified into user-based and item-based approaches, as shown in Fig 1.
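The core idea of user-based collaborative filtering (users who agreed in the past are expected to agree again) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy ratings and the helper names `RATINGS`, `mean`, `pearson`, and `predict` are hypothetical, similarity is the Pearson correlation over co-rated items, and the prediction is a mean-centered weighted average over the k most correlated users, which is one common choice rather than a formula taken from this article.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4, "D": 4},
    "bob":   {"A": 3, "B": 1, "C": 2, "D": 3, "E": 3},
    "carol": {"A": 4, "B": 3, "C": 4, "D": 3, "E": 5},
    "dave":  {"A": 1, "B": 5, "C": 5, "D": 2, "E": 1},
}


def mean(user):
    """Average of all ratings the user has given."""
    ratings = RATINGS[user]
    return sum(ratings.values()) / len(ratings)


def pearson(a, b):
    """Pearson correlation between users a and b over the items both rated."""
    common = RATINGS[a].keys() & RATINGS[b].keys()
    ma, mb = mean(a), mean(b)
    num = sum((RATINGS[a][p] - ma) * (RATINGS[b][p] - mb) for p in common)
    den = (sqrt(sum((RATINGS[a][p] - ma) ** 2 for p in common))
           * sqrt(sum((RATINGS[b][p] - mb) ** 2 for p in common)))
    return num / den if den else 0.0


def predict(user, item, k=2):
    """Predict user's rating of item from the k most correlated users who rated it."""
    candidates = [(pearson(user, v), v) for v in RATINGS
                  if v != user and item in RATINGS[v]]
    neighbors = sorted(candidates, reverse=True)[:k]
    # Mean-centered weighted average of the neighbors' ratings (an assumed, common choice)
    num = sum(s * (RATINGS[v][item] - mean(v)) for s, v in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean(user) + num / den if den else mean(user)


print(round(pearson("alice", "dave"), 2), round(predict("alice", "E"), 2))  # -0.77 4.85
```

Dave's ratings run opposite to Alice's, so his correlation with her is negative and he falls outside her two-user neighborhood; the prediction for item E comes from Bob and Carol, who agreed with her in the past.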
Collaborative filtering resembles the 'wisdom of crowds': it shows viewers which products related to the item they are viewing are currently trending. Several algorithms fall under this method, some of which are very effective and widely used. To quantify the results, we use the following algorithms: traditional data mining, nearest-neighbor collaborative filtering, and the pure item-based algorithm. A hybrid algorithm is used to overcome the problems of these algorithms.

Collaborative filtering and content-based filtering both have problems when scaling to larger datasets. Data keeps accumulating, and efficiency will become an ever bigger issue, so a technique called dimensionality reduction is also used to overcome common recommender-system problems on large datasets, such as scalability, sparsity, and synonymy.

As Aristotle stated [1]: “Man is by nature a social animal; an individual who is unsocial naturally and not accidentally is either beneath our notice or more than human. Society is something that precedes the individual. Anyone who either cannot lead the common life or is so self-sufficient as not to need to, and therefore does not partake of society, is either a beast or a god.” Recommender systems also help build social networks, connecting people who share similar interests, activities, backgrounds, or real-life connections. Social networking sites allow users to share ideas, pictures, posts, activities, events, and interests with the people in their network. A commonly used algorithm in social networking is the Temporal Context-Aware Mixture Model.

A Long Answer
Content-Based Filtering
Content-based filtering systems are based on user profiles created at the beginning. These profiles contain information about the users and their preferences.
A user's taste is inferred from how they have rated items in the past. In this recommendation process, the engine compares the items the user rated positively with the items the user rated poorly and looks for similarities. The items most similar to the positively rated ones are then recommended to the user.

Collaborative Filtering
Collaborative filtering has been very successful in both research and practice, but important research remains to be done on two fundamental challenges for collaborative filtering recommender systems. The first challenge is to improve the scalability of collaborative filtering algorithms. These algorithms can search tens of thousands of potential neighbors in real time, but modern e-commerce systems need to search tens of millions. Current algorithms also have performance problems with individual customers for whom the site holds large amounts of information. The second challenge is to ensure the quality of the recommendations. Consumers need recommendations they can trust to help them find products they will like. If a consumer trusts a recommender system, buys a product, and then finds they do not like it, they are unlikely to use the recommender system again.

User-Based Collaborative Filtering
In this approach users play the central role. It is based on user-to-user correlations: it finds highly correlated users and recommends the items those users preferred. Figure 2 represents the similarity matching used between users.
Figure 2: User-Based Collaborative Filtering

Item-Based Collaborative Filtering
In this approach the similarities between items are calculated from users' behavior, and the method recommends items that are highly correlated with the items the user already likes. The scalability problem is reduced because this approach uses correlations among a limited number of items instead of a large number of users.
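The item-based approach can be sketched in a few lines, under stated assumptions: the toy ratings and the names `RATINGS`, `item_cosine`, `SIM`, and `predict` are hypothetical. Item similarity here is only the cosine metric computed over users who rated both items (the distance and co-occurrence metrics are omitted), and the prediction is the weighted sum of the user's ratings on the k most similar items, normalized by the total similarity.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}
RATINGS = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5, "i3": 2, "i4": 4},
    "u3": {"i1": 1, "i2": 2, "i3": 5, "i4": 2},
    "u4": {"i2": 4, "i3": 1, "i4": 5},
}
ITEMS = sorted({item for user_ratings in RATINGS.values() for item in user_ratings})


def item_cosine(i, j):
    """Cosine similarity of items i and j, using only users who rated both."""
    co_raters = [u for u, r in RATINGS.items() if i in r and j in r]
    dot = sum(RATINGS[u][i] * RATINGS[u][j] for u in co_raters)
    norm_i = sqrt(sum(RATINGS[u][i] ** 2 for u in co_raters))
    norm_j = sqrt(sum(RATINGS[u][j] ** 2 for u in co_raters))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


# Offline step: build the item-item similarity table once; it is reused for every user.
SIM = {(i, j): item_cosine(i, j) for i in ITEMS for j in ITEMS if i != j}


def predict(user, item, k=2):
    """Weighted sum of the user's ratings on the k rated items most similar to `item`."""
    rated = [j for j in RATINGS[user] if j != item]
    neighbors = sorted(rated, key=lambda j: SIM[(item, j)], reverse=True)[:k]
    num = sum(SIM[(item, j)] * RATINGS[user][j] for j in neighbors)
    den = sum(abs(SIM[(item, j)]) for j in neighbors)  # normalizes back to the rating scale
    return num / den if den else 0.0


print(round(predict("u1", "i4"), 2))  # 4.5
```

Because the `SIM` table depends only on the items, it can be computed offline and reused for every user, which is the scalability argument made for item-based filtering.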
Figure 3 represents item-based collaborative filtering.
Figure 3: Item-Based Collaborative Filtering

The most important algorithms used in recommender systems are discussed below.

PURE ITEM-BASED ALGORITHM
The pure item-based algorithm represents each item as a vector of user ratings. It looks at the items the user has rated and selects the k most similar items $\{i_1, i_2, \ldots, i_k\}$. The similarity between items is calculated with various similarity metrics. Predictions are then computed from the most similar items by taking a weighted average of the user's ratings on those items.

ITEM SIMILARITY COMPUTATION
The most important step in the item-based algorithm is computing the similarity between items. The users who have rated both items i and j are isolated, and a similarity computation technique is applied to their ratings to calculate the similarity between the two items. Figure 1 depicts this process: the matrix rows represent users and the columns represent items.
Figure 4: Item similarity matrix [1]

There are many ways of calculating the similarity between items. Three metrics are used here: the distance-based similarity metric, the cosine-based similarity metric, and the co-occurrence-based similarity metric.

Distance similarity metric
This metric considers the distance between two item vectors: the larger the distance between them, the smaller the similarity value. The distance similarity metric $\mathrm{Sim}_{dist}(I_i, I_j)$ is computed over $N = U_i \cap U_j$, where $U_i$ is the set of users who have rated item $I_i$. $\mathrm{Sim}_{dist}$ ranges from 0 to 1.

Cosine similarity metric
In this metric the similarity between two item vectors is calculated as the cosine of the angle between them, so it measures how closely the directions of the two vectors agree.
If two item vectors point in the same direction, $\mathrm{Sim}_{cos}(I_i, I_j)$ is 1; if they are orthogonal, it is 0. Since ratings are non-negative, the cosine between two rating vectors never falls below 0. Figure 1 shows that the similarity between items i and j is based on the m × n rating matrix. $\mathrm{Sim}_{cos}(I_i, I_j)$ is calculated as

$$\mathrm{Sim}_{cos}(I_i, I_j) = \frac{\vec{I_i} \cdot \vec{I_j}}{\lVert \vec{I_i} \rVert \, \lVert \vec{I_j} \rVert}$$

Co-occurrence similarity metric
$\mathrm{Sim}_{com}(I_i, I_j)$ measures users' co-occurrence behavior on the given item pair.

The final similarity metric is the product of these three metrics:

$$\mathrm{Sim}_{CF}(I_i, I_j) = \mathrm{Sim}_{dist}(I_i, I_j) \cdot \mathrm{Sim}_{cos}(I_i, I_j) \cdot \mathrm{Sim}_{com}(I_i, I_j)$$

PREDICTION COMPUTATION
After calculating the similarity between items, the next step is to compute predictions. There are two basic techniques for computing predictions.

Weighted Sum: This method computes the prediction on item i for user u from the ratings the user has given to items similar to i. Each rating is weighted by the similarity $\mathrm{Sim}(I_i, I_j)$ between items i and j, so the method estimates how the user rates similar items. The weighted sum is normalized by the sum of the (absolute) similarity values so that predictions stay within the predefined rating range.

Regression: This method approximates the ratings with a regression model and uses the approximated values for prediction. The rating vector of the target item is denoted $R_i$ and the rating vector of a similar item is denoted $R_N$. The linear model is

$$R_N' = \alpha R_i + \beta + \epsilon$$

where $\epsilon$ is the error of the regression model. The parameters $\alpha$ and $\beta$ are determined from the two rating vectors.

Nearest-Neighbor Collaborative Filtering
CF systems recommend products to a customer based on the opinions of other customers.
These systems use statistical algorithms to find a set of customers, known as neighbors, who have a history of agreeing with the target user, either because they rated products similarly or because they tend to buy similar sets of products. Once the neighborhood is formed, the system uses one of several algorithms to produce recommendations. The collaborative filtering process is divided into three sub-tasks (representation, neighborhood formation, and recommendation generation), as shown in Figure 5. The representation task deals with the scheme used to model the products a customer has already bought. The neighborhood formation task focuses on identifying the customer's nearest neighbors. The recommendation generation task focuses on finding the top-N recommended products from the neighborhood of customers. The rest of this section describes some possible ways of performing these tasks.
Figure 5: Three Main Parts of a Recommender System [1]

In nearest-neighbor collaborative filtering, computing the similarity between customers is essential: it is used to form a proximity-based neighborhood between a customer and a number of like-minded customers. Neighborhood formation is essentially the model-building, or learning, process of a recommender system algorithm. Its main goal is to find, for each customer v, an ordered list of L customers $N = \{N_1, N_2, \ldots, N_L\}$ such that $v \notin N$, $\mathrm{sim}(v, N_1)$ is maximum, $\mathrm{sim}(v, N_2)$ is the next maximum, and so on. Neighborhood formation has two aspects: the proximity measure and the neighborhood formation algorithm.
Proximity Measure: The proximity between two customers is measured using either the cosine or the correlation measure.
Correlation: The proximity between two users a and b is measured with the Pearson correlation coefficient:

$$\mathrm{sim}(a, b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$

where a and b are users, $r_{a,p}$ is the rating of user a for item p, P is the set of items rated by both a and b, and $\bar{r}_a$, $\bar{r}_b$ are the users' average ratings.

Cosine: The two customers a and b are treated as vectors $\vec{a}$ and $\vec{b}$ in the n-dimensional product space (or the k-dimensional space in the case of a reduced representation). Their proximity is the cosine of the angle between the two vectors:

$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$

Different Neighborhood Types
After computing the proximity between customers, the next task is to form the neighborhood itself. There are various schemes for neighborhood formation; we discuss two here.
Center-based scheme: This scheme forms a neighborhood of size L for a particular customer a by simply selecting the L nearest other customers.
Aggregate Neighborhood scheme: This scheme forms a neighborhood of size L for a customer a by first picking the neighbor closest to a. The remaining L - 1 neighbors are then selected as follows. Suppose that at some point there are m neighbors in the neighborhood N, where m < L. The algorithm computes the centroid of the neighborhood, defined as the vector $C = \frac{1}{m}\sum_{V \in N} V$, and adds the customer $w \notin N$ that is closest to C as the next neighbor. This scheme lets all the current nearest neighbors influence the formation of the neighborhood, and it can be beneficial for very large datasets.

Recommendation Generation
The final step of a CF recommender system is to extract the top-N recommendations from the neighborhood customers. There are two techniques for performing this task.
Most-frequent Item Recommendation: This technique looks into the neighborhood N and, for each neighbor, scans through his or her purchase data and performs a frequency count of the products.
After all the neighbors have been scanned, the system sorts the products by frequency count and returns the N most frequent products that the current user has not yet purchased.
Association Rule-based Recommendation: This technique applies association-rule-based top-N recommendation, but instead of using the entire population of customers to generate the rules, it considers only the customers in the neighborhood. Using only a small number of neighbors may not generate a strong enough set of association rules and, as a consequence, may yield too few products to recommend. This can be improved by a scheme in which any remaining products, if necessary, are computed with the most-frequent item algorithm.

Hybrid Algorithm
A hybrid recommender system combines collaborative filtering and content-based filtering in order to achieve the best recommendation results. Several studies that compare hybrid methods with pure collaborative and content-based methods show that hybrid methods can provide more precise recommendations than the pure approaches. The hybrid approach can also be used to overcome some common problems of recommender systems, such as the cold-start and sparsity problems. Netflix is an example of a hybrid recommender system.

The hybrid algorithm combines the similarities of two pure algorithms into one final hybrid similarity metric, controlled by a static combination parameter that strongly affects the final performance of the algorithm. A static combination parameter did not perform well in the experiments, so a dynamic combination parameter was introduced to improve the performance of the algorithm more precisely. The recommendation for user u is then based on the hybrid similarity with this dynamic combination parameter.
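Since the article's hybrid similarity formula is not reproduced here, the following is only a minimal sketch assuming a linear blend of a collaborative similarity and a content-based similarity with a static combination parameter `lam`. The function `hybrid_similarity`, the helper `rank`, and the similarity values are hypothetical. The sketch shows why the combination parameter matters: changing it reorders the candidate items. How the dynamic parameter is chosen is not specified above, so it is not modeled.

```python
def hybrid_similarity(sim_cf, sim_cb, lam):
    """Blend a collaborative and a content-based similarity score.

    ASSUMPTION: a linear blend with a static combination parameter `lam`;
    this is an illustrative form, not the article's own formula.
    lam = 1.0 gives pure collaborative filtering, lam = 0.0 pure content-based.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return lam * sim_cf + (1.0 - lam) * sim_cb


# Hypothetical similarities of three candidate items to an item the user liked
sim_cf = {"x": 0.9, "y": 0.2, "z": 0.6}  # from co-rating behavior
sim_cb = {"x": 0.1, "y": 0.9, "z": 0.6}  # from item attributes


def rank(lam):
    """Order candidate items by their hybrid similarity, highest first."""
    return sorted(sim_cf,
                  key=lambda item: hybrid_similarity(sim_cf[item], sim_cb[item], lam),
                  reverse=True)


for lam in (1.0, 0.5, 0.0):
    print(lam, rank(lam))
# 1.0 ['x', 'z', 'y']
# 0.5 ['z', 'y', 'x']
# 0.0 ['y', 'z', 'x']
```

A fixed `lam` applies the same trade-off to every user and item; a dynamic parameter would instead adapt it case by case.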
The root mean squared error (RMSE) is used to measure the performance of the algorithm; the lower the RMSE, the better the algorithm performs:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (p_i - r_i)^2}$$

where N is the total number of ratings in the test set, $p_i$ is the predicted rating, and $r_i$ is the real rating in the test set.

Case Study: Hybrid Algorithm
Three datasets with different density levels were taken from the open MovieLens data. First, 600 movies were selected at random and used to filter the original MovieLens dataset. Then 1200 users were selected at random for each density level. In each dataset, items rated by fewer than three users and users who rated fewer than three movies were discarded. For data set 1, three ratings per user were selected to generate the test set; for data sets 2 and 3, the corresponding numbers were 5 and 10. The experimental results for the static and dynamic combination parameters on the three datasets are shown below; the datasets show better results with the dynamic combination parameter.

Experimental Results
Figure 6: SH and DH values for the three datasets

Traditional Data Mining: Association Rules